(54) Electronically archiving and retrieving data objects



(57) Methods, systems and computer programs for electronically
archiving (400) and retrieving data objects (210): For archiving (400),
data objects (210) are one-to-one converted (410) to markup objects
(220). Each markup object (220) represents the data items (212) of the
corresponding data object (210). The markup objects (220) are
concatenated (430) to a single data structure (200) that is byte
addressable. Object identification (222) is indexed (440) to addresses
(205) of the data structure (200) for each markup object (220).
Retrieving is performed in inverse order. Further features are: using
XML, coding numerical items by characters, character set code
identification, compressing and expanding, and adding index and
semantic descriptor to the structure (200).
Description

Field of the Invention 

[0001] The present invention generally relates to data processing
and, more particularly, relates to computer systems, programs, and
methods for archiving and retrieving data objects.

Background

[0002] Public and private organizations such as companies or
universities electronically access data by computers that implement
applications, databases and archives. Data is usually structured and
technically represented by data objects. For example, a company stores
business documents such as orders and invoices that have technically
separate representations for address, product, currency, or monetary
amount.
[0003] Technical key problems relate to (a) data amount,
(b) modifications, and (c) data access.
[0004] Generally, applications write and read the objects to and from
the database. Due to the huge amount of data ("mass data"), electronic
archiving tools periodically copy selected data from the databases to
long-term electronic archives. Long-term refers to a term measured in
months, years or decades. Often the archiving term is determined by
law. The archiving tools are often part of the application. The archive
is implemented by media and management software. Media are, for
example, optical, magnetic or optomagnetic media (e.g., disks, tapes).

[0005] Usually, the media is configured as WORM 
(write once, read many times) memory. 
[0006] Data selection for archiving purposes depends on a variety of
well-known aspects. For example, the tool archives data objects for
closed business transactions (such as paid invoices) but leaves data
objects for ongoing business transactions in the database (such as
unpaid invoices).
[0007] During an archiving session, the tools archive sets of data
objects rather than archiving single data objects. Technically, the
sets are archived as files. For minimizing communication and storage
overhead, administrators have to optimize the file size.
[0008] During the archiving term, the application and the archive
(especially its management software) are subject to various and often
non-coordinated modifications such as, for example, updating,
upgrading, replacing, migrating to different platforms or operating
systems, changing character code or numeric code, switching media,
modernizing programming or retrieval languages and so on. Despite
ongoing changes in application and archive tool, data must be preserved
at all times and information loss must be prevented. Information is
lost when data or metadata (i.e. determination of semantics) is lost.
While an initial application writes a data object to an initial
archive, later scenarios all challenge technical solutions:

• a modified application retrieves the same data ob- 
ject from the initial database, 

• a modified application retrieves objects from a mod- 
ified archive, or 

• the still initial application retrieves objects from a 
modified archive. 

[0009] Occasionally, the modified application is com- 
pletely different from the initial one and is reduced to a 
retrieving tool. 

[0010] Turning to data retrieving (as the complement to archiving),
the application or any other retrieving tool ("requestor") needs to
locate individual data objects (at random) and to read them from the
archive within a time frame challenged by two conditions: reading from
the medium takes time (latency, transfer rate); and the tool (and also
the person using the tool) allows only a maximum time.

[0011] Further, data objects should be retrieved without superfluous
data that causes undesired costs in terms of time, memory, bandwidth
and so on.
[0012] These and other well known requirements to electronic
archiving are often referred to by terms such as readability, platform
independence, format independence, medium independence, data transfer
efficiency, interpretability and random access.
[0013] Electronic archiving data objects is discussed 
in a variety of publications, such as, for example: 

• Schaarschmidt, Ralf: "Archivierung in Datenbanksystemen". Teubner.
Reihe Wirtschaftsinformatik. B.G. Teubner Stuttgart, Leipzig,
Wiesbaden. 2001. ISBN 3-619-00325-2.

• Herbst, Axel: "Anwendungsorientiertes DB-Archivieren". Springer
Verlag Berlin Heidelberg New York 1997. ISBN 3-540-63209-3.

• Schaarschmidt, Ralf; Roder, Wolfgang: "Datenbankbasiertes
Archivieren im SAP System R/3". Wirtschaftsinformatik 39 (1997) 6,
pages 469-477.

• Jürgen Gulbins, Markus Seyfried, Hans Strack-Zimmermann:
"Dokumenten-Management", Springer Berlin 1998. ISBN 3-540-61595-4.

[0014] Archiving and retrieving tools are commercially available. For
example, the Archive Development Kit (ADK) is such a proprietary tool
for R/3 business application systems by SAP Aktiengesellschaft,
Walldorf (Baden), Germany.

[0015] There is a need to provide a method, system and computer
program that accommodate (a) data amount, (b) modifications, and
(c) data access.

Summary of the Invention 

[0016] The present invention relates to complementary methods,
systems and programs for electronically archiving and retrieving data
objects (DO). For archiving, a computer converts the data objects (DO)
into markup objects (MO), concatenates the markup objects (MO) to a
data structure (DS) being a single byte addressable file, and indexes
object identification (OID) to addresses for each markup object (MO).
Retrieving is performed essentially in the opposite order with
corresponding steps looking up, reading and converting.
[0017] Optional features to ensure interpretability are:

• using XML, 

• coding numerical items by characters,

• character set code identification (CID, MIBenum),

• compressing and expanding (MO to CO, back to MO, considering LID),
and

• adding index (I) and semantic descriptor (D, DTD, XML schema) to the
data structure.

[0018] The present invention addresses technical key problems that
relate to data amount, modifications, and data access. Modifications to
application and archive are accommodated.

Brief Description of the Drawings

[0019]

FIG. 1 illustrates a simplified block diagram of a computer system;
FIG. 2 illustrates a simplified memory with memory portions and data
structure (DS);
FIG. 3 illustrates an exemplary data object (DO);
FIG. 4 illustrates an exemplary markup object (MO);
FIG. 5 illustrates an exemplary compressed object (CO);
FIG. 6 illustrates a data structure with concatenated markup objects
(MO);
FIG. 7 illustrates the data structure (DS) with concatenated
compressed objects (CO);
FIG. 8 illustrates an overview for an archiving method by showing data
objects (DO), markup objects (MO), the data structure (DS) and an
index (I);
FIG. 9 illustrates a flowchart for the archiving method;
FIG. 10 illustrates a flowchart for a retrieving method; and
FIG. 11 illustrates a hierarchy of a data table with exemplary data
objects (DO), as well as an XML-file for the complete table and the
index (I).

Computer System in General

[0020] FIG. 1 illustrates a simplified block diagram of the inventive
computer network system 999 having a plurality of computers 900, 901,
902 (or 90q, with q=0...Q-1, Q any number).

[0021] Computers 900-902 are coupled via inter-computer network 990.
Computer 900 comprises processor 910, memory 920, bus 930, and,
optionally, input device 940 and output device 950 (I/O devices, user
interface 960). As illustrated, the invention is present by computer
program product 100 (CPP), program carrier 970 and program signal 980,
collectively "program".
[0022] In respect to computer 900, computer 901/902 is sometimes
referred to as "remote computer"; computer 901/902 is, for example, a
server, a router, a peer device or other common network node, and
typically comprises many or all of the elements described relative to
computer 900. Hence, elements 100 and 910-980 in computer 900
collectively illustrate also corresponding elements 10q and 91q-98q
(shown for q=0) in computers 90q.

[0023] Computer 900 is, for example, a conventional
personal computer (PC), a desktop and hand-held de- 
vice, a multiprocessor computer, a pen computer, a mi- 
croprocessor-based or programmable consumer elec- 
tronics, a minicomputer, a mainframe computer, a per- 
sonal mobile computing device, a mobile phone, a port- 
able or stationary personal computer, a palmtop com- 
puter or the like. 

[0024] Processor 910 is, for example, a central 
processing unit (CPU), a micro-controller unit (MCU), 
digital signal processor (DSP), or the like. 
[0025] Memory 920 symbolizes elements that temporarily or permanently
store data and instructions. Although memory 920 is conveniently
illustrated as part of computer 900, memory function can also be
implemented in network 990, in computers 901/902 and in processor 910
itself (e.g., cache, register), or elsewhere. Memory 920 can be a read
only memory (ROM), a random access memory (RAM), or a memory with other
access options. Memory 920 is physically implemented by
computer-readable media, such as, for example: (a) magnetic media, like
a hard disk, a floppy disk, or other magnetic disk, a tape, a cassette
tape; (b) optical media, like optical disk (CD-ROM, digital versatile
disk - DVD); (c) semiconductor media, like DRAM, SRAM, EPROM, EEPROM,
memory stick, or by any other media, like paper.

[0026] Optionally, memory 920 is distributed across 
different media. Portions of memory 920 can be remov- 
able or non-removable. For reading from media and for 
writing in media, computer 900 uses devices well known 
in the art such as, for example, disk drives, tape drives. 
[0027] Memory 920 stores support modules such as, for example, a
basic input output system (BIOS), an operating system (OS), a program
library, a compiler, an interpreter, and a text-processing tool.
Support modules are commercially available and can be installed on
computer 900 by those of skill in the art. For simplicity, these
modules are not illustrated.

[0028] CPP 100 comprises program instructions and - optionally - data
that cause processor 910 to execute method steps of the present
invention. Method steps are explained with more detail below. In other
words, CPP 100 defines the operation of computer 900 and its
interaction in network system 999. For example and without the
intention to be limiting, CPP 100 can be available as source code in
any programming language, and as object code ("binary code") in a
compiled form. Persons of skill in the art can use CPP 100 in
connection with any of the above support modules (e.g., compiler,
interpreter, operating system).

[0029] Although CPP 100 is illustrated as being stored in memory 920,
CPP 100 can be located elsewhere. CPP 100 can also be embodied in
carrier 970.
[0030] Carrier 970 is illustrated outside computer 900. For
communicating CPP 100 to computer 900, carrier 970 is conveniently
inserted into input device 940. Carrier 970 is implemented as any
computer readable medium, such as a medium largely explained above (cf.
memory 920). Generally, carrier 970 is an article of manufacture
comprising a computer readable medium having computer readable program
code means embodied therein for executing the method of the present
invention. Further, program signal 980 can also embody computer program
100. Signal 980 travels on network 990 to computer 900. Having
described CPP 100, program carrier 970, and program signal 980 in
connection with computer 900 is convenient. Optionally, program carrier
971/972 (not shown) and program signal 981/982 embody computer program
product (CPP) 101/102 to be executed by processor 911/912 (not shown)
in computers 901/902, respectively.

[0031] Input device 940 symbolizes a device that provides data and
instructions for processing by computer 900. For example, device 940 is
a keyboard, a pointing device (e.g., mouse, trackball, cursor direction
keys), microphone, joystick, game pad, scanner, disk drive. Although
the examples are devices with human interaction, device 940 can also
operate without human interaction, such as a wireless receiver (e.g.,
with satellite dish or terrestrial antenna), a sensor (e.g., a
thermometer), a counter (e.g., goods counter in a factory). Input
device 940 can serve to read carrier 970.
[0032] Output device 950 symbolizes a device that presents
instructions and data that have been processed. For example, device 950
is a monitor or a display (cathode ray tube (CRT), flat panel display,
liquid crystal display (LCD)), a speaker, a printer, a plotter, or a
vibration alert device. As above, output device 950 communicates with
the user, but it can also communicate with further computers.

[0033] Input device 940 and output device 950 can be combined to a
single device; each of devices 940 and 950 is optional.

[0034] Bus 930 and network 990 provide logical and physical
connections by conveying instruction and data signals. While
connections inside computer 900 are conveniently referred to as "bus
930", connections between computers 900-902 are referred to as "network
990". Optionally, network 990 comprises gateways, being computers that
specialize in data transmission and protocol conversion. Devices 940
and 950 are coupled to computer 900 by bus 930 (as illustrated) or by
network 990 (optional). While the signals inside computer 900 are
mostly electrical signals, the signals in the network are electrical,
magnetic, optical or wireless (radio) signals. Networking environments
(as network 990) are commonplace in offices, enterprise-wide computer
networks, intranets and the Internet (i.e. world wide web). The
physical distance between a remote computer and computer 900 is not
important. Network 990 can be a wired or a wireless network. To name a
few network implementations, network 990 is, for example, a local area
network (LAN), a wide area network (WAN), a public switched telephone
network (PSTN), an Integrated Services Digital Network (ISDN), an
infra-red (IR) link, a radio link, like Universal Mobile
Telecommunications System (UMTS), Global System for Mobile
Communication (GSM), Code Division Multiple Access (CDMA), or a
satellite link.

[0035] Transmission protocols and data formats are known, for
example, as transmission control protocol/internet protocol (TCP/IP),
hyper text transfer protocol (HTTP), secure HTTP, wireless application
protocol, uniform resource locator (URL), uniform resource identifier
(URI), hyper text markup language (HTML), extensible markup language
(XML), extensible hyper text markup language (XHTML), wireless
application markup language (WML), Standard Generalized Markup Language
(SGML) etc. Interfaces coupled between the elements are also well known
in the art. For simplicity, interfaces are not illustrated. An
interface can be, for example, a serial port interface, a parallel port
interface, a game port, a universal serial bus (USB) interface, an
internal or external modem, a video adapter, or a sound card. Computer
and program are closely related. As used hereinafter, phrases such as
"the computer provides" and "the program provides" are convenient
abbreviations to express actions by a computer that is controlled by a
program.

Detailed Description 

[0036] Computer system 999 of FIG. 1 is considered to have
archive/retrieve computer 900, application computer 901, and database
computer 902. CPP 100 is considered to be operating on computer 900.
Implementing the present invention to the other computers is convenient
as well. Terms are used as follows: "Retrieve" refers to reading
objects from the archive. "Data object (DO)" stands for structured data
as provided by the application. "Markup object (MO)" stands for an
object in markup language. "Compressed object (CO)" stands for an
object in a compressed form. "Descriptor" stands for any schema (also:
"scheme") that indicates the semantics of the markup language. "File"
stands for a data structure with a plurality of addressable bytes.
"Byte" stands for the smallest unit of information that is discussed
here. Usually, a byte has 8 bits.
[0037] FIG. 2 illustrates simplified memory 920 with memory portions
206 and data structure 200/201. Memory 920 has a plurality of byte
addressable memory portions 206 (symbolized by lines). As indicated by
a bold frame, memory 920 (cf. FIG. 1) stores data structure 200 (cf.
FIG. 6).

[0038] In an alternative embodiment, memory 920 stores data structure
201 (cf. FIG. 7). Data structures 200/201 are also byte addressable.
[0039] FIG. 3 illustrates exemplary data object (DO) 210. Data object
210 has data items 212 and is identified by object identification (OID)
222 (e.g., a key). For convenience of explanation, a phone list is
taken as an example. Application computer 901 and database computer 902
(cf. FIG. 1) use a table with "name" and "phone" columns (data items
212-1 and 212-2). The exemplary data object 210 is the entry with the
name "BETA" (item 212-1) and the phone number "123 456" (item 212-2).
For convenience, FIG. 3 shows data object 210 by a bold frame. Using
explicit OID is convenient; however, implicit identification is
sufficient.
[0040] FIG. 4 illustrates exemplary markup object (MO) 220. Markup
object 220 represents data items 212 of corresponding data object 210.
In other words, markup object 220 has been obtained by one-to-one
conversion of item 212-1 (e.g., name) and item 212-2 (e.g., phone) of
data object 210. Representing in markup language is well known in the
art. For example, the markup language is XML. As in the example, the
language reads as <name="BETA" phone="123 456"> with items 212 (e.g.,
BETA, 123 456) and tag identifiers therefor (e.g., <name=" "
phone=" ">). FIG. 4 illustrates markup object 220 by bytes (cf. FIG. 2)
with N=30 byte.

[0041] Alternatives for the tag identifiers are, for example,
<name>BETA</name> and <phone>123 456</phone>.
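
As an illustration only, the one-to-one conversion of a data object into a markup object could be sketched as follows in Python. The function name `to_markup_object`, the use of a dictionary as data object, and the UTF-8 encoding are assumptions made for this sketch; it uses the element-style tags of the alternative notation.

```python
from xml.sax.saxutils import escape


def to_markup_object(data_object: dict) -> bytes:
    """Convert one data object into one markup object (one-to-one).

    Every data item becomes one element; numeric values are written as
    characters via str(). The helper name and the dict layout are
    illustrative assumptions, not taken from the description.
    """
    parts = [
        f"<{tag}>{escape(str(value))}</{tag}>"
        for tag, value in data_object.items()
    ]
    return "".join(parts).encode("utf-8")


# The phone-list entry used as the running example (FIG. 3).
beta = {"name": "BETA", "phone": "123 456"}
markup_object = to_markup_object(beta)
print(markup_object)       # b'<name>BETA</name><phone>123 456</phone>'
print(len(markup_object))  # number of bytes N of this markup object
```

Escaping the item values keeps characters such as "<" inside a data item from being mistaken for tag identifiers.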

[0042] FIG. 5 illustrates exemplary compressed object (CO) 230. In
the example, the tag identifiers of FIG. 4 have been compressed to <1>
and <2>; data items 212 are not compressed. The number of bytes has
been reduced from N=30 to L=18. The first byte indicates L by length
identification (LID) 224.
[0043] Persons of skill in the art can select other compression
techniques, for example Huffman coding.
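
A minimal sketch of the tag compression with a leading length byte might look like this. The table `TAG_CODES`, the function names, the element-style tags, and the choice that L counts the LID byte itself are assumptions for illustration; a single LID byte also limits a compressed object to 255 bytes in this sketch.

```python
# Hypothetical table that maps tag identifiers to short codes.
TAG_CODES = {"name": "1", "phone": "2"}


def compress(markup_object: bytes) -> bytes:
    """Shorten tag identifiers and prefix the result with a one-byte LID."""
    text = markup_object.decode("utf-8")
    for tag, code in TAG_CODES.items():
        text = text.replace(f"<{tag}>", f"<{code}>")
        text = text.replace(f"</{tag}>", f"</{code}>")
    body = text.encode("utf-8")
    length = len(body) + 1  # assumption: L includes the LID byte
    if length > 255:
        raise ValueError("object too long for a one-byte LID")
    return bytes([length]) + body


def expand(compressed_object: bytes) -> bytes:
    """Read L from the LID byte and restore the original tag identifiers."""
    length = compressed_object[0]
    text = compressed_object[1:length].decode("utf-8")
    for tag, code in TAG_CODES.items():
        text = text.replace(f"<{code}>", f"<{tag}>")
        text = text.replace(f"</{code}>", f"</{tag}>")
    return text.encode("utf-8")


mo = b"<name>BETA</name><phone>123 456</phone>"
co = compress(mo)
print(len(mo), len(co))  # 39 26
assert expand(co) == mo
```

The data items themselves stay untouched, as in the example of FIG. 5; only the tag identifiers shrink.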
[0044] FIG. 6 illustrates data structure 200 with concatenated markup
objects (MO) 220-1, 220-2 and 220-3. For convenience, exemplary byte
addresses (A) 205 are indicated on the left side. Decimal numbers are
used for convenient counting; hexadecimal or other number systems are
useful as well.
[0045] Index (I) 250 and descriptor (D) 260 have addresses 0001 to
0050 and 0051 to 0100, respectively. Markup object 220-1 has N=100 byte
at addresses 0101-0200; markup object 220-2 (cf. FIG. 4) has N=30 byte
at following addresses 0201-0230; markup object 220-3 has N=70 byte at
following addresses 0231-0300.
[0046] Index 250 is a control block for storing this assignment (OID
to A). In other words, object identification OID=1 (for MO 220-1) has
been indexed to A=0101; OID=2 (for MO 220-2) has been indexed to
A=0201; and OID=3 (for MO 220-3) has been indexed to A=0231. The
descriptor D represents the semantics of data items 212 in markup
objects 220, for example, by stating that the tag identifiers (cf. FIG.
4) stand for name and phone number.
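
The address assignment of this example can be reproduced with a short sketch. The function name `concatenate_and_index`, the zero-filled placeholder for index and descriptor, and the dummy object contents are assumptions; only the sizes (100, 30 and 70 byte) and the 100 reserved bytes are taken from the example.

```python
def concatenate_and_index(markup_objects: dict, header_size: int = 100):
    """Concatenate markup objects and record the address of each one.

    The first header_size bytes stand in for index (I) and descriptor (D);
    addresses are 1-based, as in the example of FIG. 6.
    """
    structure = bytearray(header_size)  # placeholder for I and D
    index = {}
    for oid, markup_object in markup_objects.items():
        index[oid] = len(structure) + 1  # first byte of this object
        structure += markup_object
    return bytes(structure), index


# Dummy contents with the sizes of the example: 100, 30 and 70 byte.
objects = {1: b"x" * 100, 2: b"y" * 30, 3: b"z" * 70}
structure, index = concatenate_and_index(objects)
print(index)           # {1: 101, 2: 201, 3: 231}
print(len(structure))  # 300
```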

[0047] Optionally, two or more markup objects (MO) are coded by
different character sets. Character sets are standardized by well-known
organizations, such as ISO and JIS, or companies. For example, markup
objects 220-1 and 220-2 use Latin, but object 220-3 uses Cyrillic (or
Greek, or Chinese, or Japanese, or Korean, or Arabic, etc.). FIG. 6
also illustrates that code identification (CID) 226 for markup object
220-3 has been added at addresses 0231-0232.

[0048] While in the prior art (ADK) a single character set is valid
for a complete data structure, the present invention optionally
distinguishes character sets for each object. CID 226 can be
represented by text or by numbers. The Internet Assigned Numbers
Authority (IANA) identifies character sets by unique integer numbers,
the so-called "MIBenum" numbers (Management Information Base). Using
such a standard is convenient because code identification (CID) is
interpretable without any further information. For example, CID 226
(for MO 220-3) is MIBenum "2084", technically represented by the bit
patterns 00001000 and 00100100 at the mentioned addresses.
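
The two-byte representation of the code identification can be checked with a few lines of Python. In the IANA registry, MIBenum 2084 is assigned to KOI8-R, a Cyrillic character set. The mapping table `MIBENUM_TO_CODEC`, the helper names, the big-endian byte order and the sample text are assumptions made for this sketch.

```python
# Assumed mapping from IANA MIBenum numbers to Python codec names.
MIBENUM_TO_CODEC = {4: "latin_1", 106: "utf_8", 2084: "koi8_r"}


def add_cid(text: str, mibenum: int) -> bytes:
    """Prefix an encoded markup object with a two-byte CID."""
    cid = mibenum.to_bytes(2, "big")  # high-order byte first
    return cid + text.encode(MIBENUM_TO_CODEC[mibenum])


def read_with_cid(markup_object: bytes) -> str:
    """Read the CID and decode the rest with the character set it names."""
    mibenum = int.from_bytes(markup_object[:2], "big")
    return markup_object[2:].decode(MIBENUM_TO_CODEC[mibenum])


# Hypothetical Cyrillic entry, standing in for markup object 220-3.
mo = add_cid("<name>ГАММА</name>", 2084)
print([format(byte, "08b") for byte in mo[:2]])  # ['00001000', '00100100']
print(read_with_cid(mo))
```

Because the CID travels with each object, a later reader needs no outside knowledge about which character set was in use at archiving time.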

[0049] FIG. 7 illustrates data structure 201 with concatenated
compressed objects (CO) 230. Similar to structure 200 in FIG. 6,
structure 201 is byte addressable. The objects are compressed objects
(CO) 230 (cf. FIG. 5) each having length identification (LID) 224 (bold
frames). For example, MO 220-1 with N=100 byte has been compressed to
CO 230-1 with L=50 byte; MO 220-2 with N=30 byte has been compressed to
CO 230-2 with L=18 byte; and MO 220-3 with N=70 byte has been
compressed to CO 230-3 with L=40 byte. Length identification 224
indicates L for each CO 230, preferably, at the beginning of each CO
230.

[0050] FIG. 8 illustrates an overview for archiving method 400 by
data objects (DO) 210, markup objects (MO) 220, data structure (DS) 200
and index (I) 250, as well as arrows for method steps converting 410,
concatenating 430 and indexing 440. Index (I) 250 stores object
identification (OID) 222 with corresponding addresses 205 of data
structure 200 for each markup object 220.

[0051] FIG. 9 illustrates a flowchart for archiving method 400.
Method 400 for electronically archiving a plurality of data objects 210
(having data items 212, cf. FIG. 3) comprises concatenating 430 objects
210 (i.e. as MO 220) to data structure 200 (byte addressable, cf. FIGS.
2, 6) and indexing 440 object identification 222 for each object
210/220 to addresses (A) 205 of data structure 200. Prior to
concatenating 430, the following step is performed: converting 410 the
plurality of data objects 210 to a plurality of markup objects 220 by
one-to-one conversion, wherein each markup object 220 represents data
items 212 of the corresponding data object 210. Useful and preferred
features are indicated by bullet marks and a dashed frame. Preferably,
during converting 410, markup objects 220 are provided in extensible
markup language (XML). Preferably, numerical data items 212 are encoded
by character code. For example, the real number "2.5" is coded to a
character-only string with the character "2", the period character, and
the character "5". Preferably, during converting 410, code
identification (CID) 226 is added to some or all markup objects 220.
Preferably, during converting 410, code identification 226 is
represented by MIBenum numbers for character sets defined by IANA (cf.
FIG. 6). Preferably, after converting 410 but prior to concatenating
430, compressing 420 compresses markup objects (MO) 220 to compressed
objects (CO) 230 with length identification (LID) 224 so that
compressed objects 230 are concatenated 430 to structure 201 (cf. FIG.
7).
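
The character coding of numerical data items can be illustrated with a small sketch. The helper names are hypothetical, and the comparison with a binary floating-point layout is added here only to show why a character-only string is easier to interpret after a platform change.

```python
import struct


def encode_number(value: float) -> bytes:
    """Code a numerical data item as a character-only string."""
    return repr(value).encode("ascii")


def decode_number(characters: bytes) -> float:
    """Turn the character string back into a number when retrieving."""
    return float(characters.decode("ascii"))


encoded = encode_number(2.5)
print(encoded)        # b'2.5'
print(list(encoded))  # [50, 46, 53]: the characters "2", "." and "5"

# A binary representation, by contrast, depends on conventions such as
# byte order that a reader must know in advance.
big_endian = struct.pack(">d", 2.5)
little_endian = struct.pack("<d", 2.5)
print(big_endian != little_endian)  # True

assert decode_number(encoded) == 2.5
```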

[0052] Preferably, during indexing, descriptor (D) 260 is added to
data structure 200. D represents the semantics of the data items 212 in
the markup objects 220. Preferably, descriptor 260 is formulated in a
document type definition (DTD) schema or in XML schema.
[0053] Storing data structure 200 or 201 to media is
performed during or after method 400. Persons of skill 
in the art can accomplish this without the need of further 
explanation herein. 

[0054] Index 250 is - optionally - stored in a database separate from
structure 200/201. This approach enhances efficiency. To ensure
interpretability, descriptor 260 should be stored as part of structure
200/201.
[0055] FIG. 10 illustrates a flowchart for retrieving method 500.
Method 500 for electronically retrieving data object 210 from byte
addressable data structure 200 for a given object identification 222
comprises the following steps: looking up 510 the address (A) 205 (in
data structure 200 or database) that corresponds to the object
identification 222; reading 520 markup object 220 at address (A) 205;
converting 540 markup object 220 to data object 210, wherein markup
object 220 represents data items 212 of corresponding data object 210.
[0056] Preferably, method 500 retrieves from structure 201 (cf. FIG.
7). Prior to converting 540, compressed object (CO) 230 is expanded 530
to markup object (MO) 220 by reading length identification (LID) 224
and reading compressed object (CO) 230 as a number of bytes (i.e. L
byte) given by length identification (LID) 224.
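
The retrieving steps can be sketched as follows for a structure with compressed objects. The names `retrieve`, `expand`, `pack` and `TAG_NAMES`, the element-style tags, the wrapper element `<o>`, the convention that L includes the LID byte, and the first sample entry are all assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table for restoring compressed tag identifiers.
TAG_NAMES = {"1": "name", "2": "phone"}


def expand(body: bytes) -> bytes:
    """Expanding 530: restore the full tag identifiers."""
    text = body.decode("utf-8")
    for code, tag in TAG_NAMES.items():
        text = text.replace(f"<{code}>", f"<{tag}>")
        text = text.replace(f"</{code}>", f"</{tag}>")
    return text.encode("utf-8")


def retrieve(structure: bytes, index: dict, oid: int) -> dict:
    address = index[oid]                     # looking up 510 (1-based)
    start = address - 1
    length = structure[start]                # LID: L, assumed to include itself
    body = structure[start + 1 : start + length]      # reading 520
    markup = expand(body)                             # expanding 530
    root = ET.fromstring(b"<o>" + markup + b"</o>")   # converting 540
    return {item.tag: item.text for item in root}


def pack(body: bytes) -> bytes:
    """Build a compressed object by prefixing the LID byte."""
    return bytes([len(body) + 1]) + body


# Two compressed objects; the first entry is invented for the example.
co1 = pack(b"<1>ALPHA</1><2>111 111</2>")
co2 = pack(b"<1>BETA</1><2>123 456</2>")
structure = co1 + co2
index = {1: 1, 2: len(co1) + 1}

print(retrieve(structure, index, 2))  # {'name': 'BETA', 'phone': '123 456'}
```

Only the bytes of the requested object are decoded and parsed; the object stored in front of it is never touched.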
[0057] The optional features of method 500 correspond to the features
of method 400 (e.g., CID, MIBenum, XML, descriptor D etc.).
[0058] FIG. 11 illustrates a hierarchy of a data table with exemplary
data objects (DO) as well as illustrates an XML-file for the complete
table and index (I). The table has 3 objects, each for "name" and
"phone". Below, the figure shows a corresponding XML-file with tags for
the complete table and with object tags for object identification, for
"name" and for "phone". For convenience, closing tags (i.e., "</name>"
tags) and other well-known XML-statements are omitted.
[0059] The prior art approach with archiving the XML-file and
retrieving data items by an XML-parser would be time consuming. For a
given object identification (e.g., OID 2), the parser would have to
search for the object identification tag by reading everything stored
in front of the object to be retrieved (i.e., all tags of object 1).
[0060] The index below shows advantages of the present invention.
Retrieving is expedited because the steps looking up 510 in the index,
reading 520 from the address and converting 540 (i.e. by a parser) do
not require parsing non-relevant objects.
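
To make the saving concrete, the following sketch reads one markup object from a structure laid out like the example of FIG. 6 and reports how many bytes a sequential parser would have to pass first. The function name and the dummy contents are assumptions. The description does not state how the end of an uncompressed markup object is found; this sketch assumes it is the next indexed address or the end of the structure.

```python
def read_indexed(structure: bytes, index: dict, oid: int) -> bytes:
    """Read exactly one markup object by its indexed 1-based address."""
    addresses = sorted(index.values()) + [len(structure) + 1]
    start = index[oid]
    end = addresses[addresses.index(start) + 1]  # assumed end of the object
    return structure[start - 1 : end - 1]


# Layout of the example: 100 byte for index and descriptor, then objects
# of 100, 30 and 70 byte (dummy contents).
structure = bytes(100) + b"a" * 100 + b"b" * 30 + b"c" * 70
index = {1: 101, 2: 201, 3: 231}

wanted = read_indexed(structure, index, 2)
print(len(wanted))   # 30 bytes are read and handed to the parser
print(index[2] - 1)  # 200 bytes in front of the object are skipped
```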
[0061] Having explained the present invention in connection with
methods 400 and 500 was convenient; archive/retrieve computer 900 and
computer program product (CPP) 100 can be implemented by those of skill
in the art without the need of further explanation herein. Computer 900
has a plurality of means, each one for performing a method step.
Likewise, computer instructions of CPP 100 are provided for each method
step. Method, system and CPP can also be implemented for other
computers, such as application computer 901 or database computer 902.
Database backup was conveniently not discussed here; the present
invention can be used for backup purposes as well. Deleting data
objects from the database after archiving is well known in the art as
well.
Reference Numbers

[0062]

100 computer program product (CPP)
200 data structure with markup objects (MO)
201 data structure with compressed objects (CO)
205 addresses
206 memory portions in data structure
210 data object (DO)
212 data item
220 markup object (MO)
222 object identification (OID)
224 length identification (LID)
226 code identification (CID)
230 compressed object (CO)
250 index (I)
260 descriptor (D)
400 archiving method
410 converting
420 compressing
430 concatenating
440 indexing
4xx, 5xx method steps
500 retrieving method
510 looking up
520 reading
530 expanding
540 converting
900 archive/retrieve computer
901 application computer
902 database computer
920 memory
999 computer system
9x0 hardware components
CID code identification
CO compressed object
D descriptor
DO data object
I index
IANA Internet Assigned Numbers Authority
ID identification
L number of bytes in CO
LID length identification
MIB Management Information Base
MO markup object
N number of bytes in MO
OID object identification

Claims 

1. Method (400) for electronically archiving a plurality of data
objects (210), the data objects (210) having data items (212), the
method with concatenating (430) the objects (210/220) to a single data
structure (200) that is byte addressable and with indexing (440) object
identification (222) for each object (210/220) to addresses (205) of
the data structure (200), the method (400) characterized in that prior
to concatenating (430), the following step is performed: converting
(410) the data objects (210) to a plurality of markup objects (220) by
one-to-one conversion, wherein each markup object (220) represents the
data items (212) of the corresponding data object (210).

2. The archiving method (400) of claim 1, wherein during converting
(410), the markup objects (220) are provided in extensible markup
language (XML).

3. The archiving method (400) of claim 1, wherein during converting
(410), numerical data items (212) are encoded by character code.

4. The archiving method (400) of claim 1, wherein during converting
(410), code identification (226) is added to markup objects (220).



5. The archiving method (400) of claim 4, wherein during converting
(410), code identification (226) is represented by MIBenum numbers for
character sets by IANA.

6. The method (400) of claim 1, wherein after converting (410) but
prior to concatenating (430), the following step is performed:
compressing (420) the markup objects (220) to compressed objects (230)
with length identification (224).

7. The archiving method (400) of claim 1, wherein a descriptor (260)
representing the semantics of the data items (212) in the markup
objects (220) is added to the data structure (200).

8. The archiving method (400) of claim 7, wherein the descriptor (260)
is formulated in a document type definition (DTD) schema.

9. The archiving method (400) of claim 7, wherein the descriptor is
formulated in an XML schema.

10. Computer (900) for electronically archiving a plurality of data
objects (210), the data objects (210) having data items (212), the
computer (900) with means for concatenating (430) the objects (210/220)
to a single data structure (200) that is byte addressable and with
means for indexing (440) object identification (222) for each object
(210/220) to addresses (205) of the data structure (200), the computer
(900) characterized by means for converting (410) the data objects
(210) to a plurality of markup objects (220) by one-to-one conversion,
prior to concatenating (430), wherein each markup object (220)
represents the data items (212) of the corresponding data object (210).

11. The computer (900) of claim 10, wherein the means
for converting (410) provide the markup objects
(220) in extensible markup language (XML).

12. The computer (900) of claim 10, wherein the means
for converting (410) encode numerical data items
(212) by character code.

13. The computer (900) of claim 10, wherein the means
for converting (410) add code identification (226) to
markup objects (220). 

14. The computer (900) of claim 13, wherein the means
for converting (410) represent code identification
(226) by MIBenum numbers for character sets of
IANA.

15. The computer (900) of claim 10, with means for 
compressing (420) the markup objects (220) to 
compressed objects (230) with length identification 
(224), the means for compressing being activated
after converting (410) but prior to concatenating 
(430). 

16. The computer (900) of claim 10, having means to 
add a descriptor (260) representing the semantics 
of the data items (212) in the markup objects (220)
to the data structure (210).

17. The computer (900) of claim 16, wherein the means 
to add a descriptor (260) uses a document type def-
inition (DTD) schema.

18. The computer (900) of claim 16, wherein the means 
to add a descriptor (260) uses an XML schema.

19. Computer program product (100) for electronically 
archiving a plurality of data objects (210), the data 
objects (210) having data items (212), the computer
program product having instructions that cause a
computer (900) to concatenate (430) the objects
(210/220) to a single data structure (200) that is byte
addressable and to index (440) object identification
(222) for each object (210/220) to addresses (205)
of the data structure (200), the computer program
product (400) characterized by further instructions
for performing prior to concatenating (430): convert-
ing (410) the data objects (210) to a plurality of
markup objects (220) by one-to-one conversion, 
wherein each markup object (220) represents the 
data items (212) of the corresponding data object 
(210). 

20. The computer program product (100) of claim 19, 
wherein the instructions for converting (410) cause 
the computer to provide the markup objects (220) 
in extensible markup language (XML).

21. The computer program product (100) of claim 19, 
wherein the instructions for converting (410) cause 
the computer (900) to encode numerical data items 
(212) by character code. 

22. The computer program product (100) of claim 19, 
wherein the instructions for converting (410) cause
the computer (900) to add code identification (226) 
to markup objects (220). 

23. The computer program product (100) of claim 22, 
wherein the instructions for converting (410) cause 
the computer (900) to represent code identification 
(226) by MIBenum numbers for character sets of
IANA.

24. The computer program product (100) of claim 19, 
with further instructions that cause the computer 
(900) to compress (420) the markup objects (220) 
to compressed objects (230) with length identifica-
tion (224), the instructions for compressing being
activated after converting (410) but prior to concate-
nating (430).

25. The computer program product (100) of claim 19,
with further instructions that cause the computer to
add a descriptor (260) representing the semantics
of the data items (212) in the markup objects (220)
to the data structure (210).

26. The computer program product (100) of claim 25, 
wherein the instructions to add a descriptor (260)
cause the computer (900) to use a document type 
definition (DTD) schema. 

27. The computer program product (100) of claim 25, 
wherein the instructions to add a descriptor (260) 
cause the computer (900) to use an XML schema. 

28. A method (500) for electronically retrieving a data
object (210) from a byte addressable data structure
(200) for a given object identification (212), the
method (500) comprising the following steps:

looking up (510) the address (205) that corre-
sponds to the object identification (212);
reading (520) the markup object (220) at the ad-
dress (205);
converting (540) the markup object (220) to the
data object (210), wherein the markup object
(220) represents the data items (212) of the cor-
responding data object (210).

29. The retrieving method (500) of claim 28, wherein
prior to converting (540), a compressed object (230)
is expanded (530) to the markup object (220) by
reading a length identification (224) and reading the
compressed object (230) as a number of bytes giv-
en by the length identification (224).
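
The retrieval steps of claims 28 and 29 can be sketched as follows, under stated assumptions: compressed objects are stored with a 4-byte big-endian length field, zlib is used for expansion, the index maps an object identification directly to a byte address, and the element names and sample orders are hypothetical. None of these choices is prescribed by the claims.

```python
import struct
import zlib
import xml.etree.ElementTree as ET

def retrieve(structure: bytes, index: dict, obj_id: str) -> dict:
    """Retrieve one data object from a byte-addressable structure."""
    address = index[obj_id]                              # looking up (510)
    (n,) = struct.unpack_from(">I", structure, address)  # length identification (224)
    compressed = structure[address + 4:address + 4 + n]  # reading (520) n bytes
    markup = zlib.decompress(compressed)                 # expanding (530)
    root = ET.fromstring(markup)                         # converting (540)
    return {child.tag: child.text for child in root}

def _pack(xml: bytes) -> bytes:
    """Helper for the demo: compress and prepend the assumed 4-byte length field."""
    body = zlib.compress(xml)
    return struct.pack(">I", len(body)) + body

# Build a tiny structure holding two hypothetical orders.
first = _pack(b"<order><product>bolt</product><amount>12</amount></order>")
second = _pack(b"<order><product>nut</product><amount>7</amount></order>")
structure = first + second
index = {"order-1": 0, "order-2": len(first)}
```

Calling `retrieve(structure, index, "order-2")` reads only the bytes belonging to the second object; the first object is neither expanded nor parsed.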

30. Computer system for performing the method of
claims 28 and 29. 

31. Computer program product (100) having instruc-
tions that cause a computer to perform the method
of claims 28 and 29.